

Search for: All records

Creators/Authors contains: "Wagenmaker, Andrew"


  1. Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
    Free, publicly-accessible full text available April 1, 2026
  2. Free, publicly-accessible full text available December 1, 2025
  3. Free, publicly-accessible full text available December 10, 2025
  4. Free, publicly-accessible full text available December 1, 2025
  5. Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (Ed.)
    While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL), that is, the complexity of learning on the “worst-case” instance, such measures often fail to capture the true difficulty of learning. In practice, on an “easy” instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the “instance-dependent” complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that Pedel yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to attain the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure that focuses the exploration budget on the “directions” most relevant to learning a near-optimal policy and that may be of independent interest.